Stochastic Variables

The aim of the HydroRL model is to maximize the expected profit from generating electric power under uncertain inflow and energy prices. To learn, the model interacts with the hydro system environment, where it experiences different outcomes of the stochastic variables. The learned solution is therefore highly dependent on how the stochastic variables are modeled, which we outline in the following.

Background

To provide an illustrative example, we use historical energy prices and inflows. We rearrange the data into eight different scenarios that make up our fictitious forecast of the inflow and energy price for the next two years, shown in the illustrations below.

Time index

We resample the forecast to the required time resolution. If, for example, we want to train the model on a monthly basis, we get the following inflow and energy prices after resampling.
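As a rough illustration of the resampling step, the sketch below averages a daily series into one value per calendar month. The function name `resample_monthly`, the use of the mean as aggregation, and the example data are assumptions made for illustration; the aggregation actually used by the model is not specified here.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def resample_monthly(series):
    """Average (date, value) pairs into one value per calendar month."""
    buckets = defaultdict(list)
    for day, value in series:
        buckets[(day.year, day.month)].append(value)
    # Keys are (year, month) tuples, sorted chronologically.
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Hypothetical daily prices: 10.0 on every day of January 2020,
# 20.0 on every day of February 2020 (60 days in total).
start = date(2020, 1, 1)
daily = [(start + timedelta(days=k), 10.0 if k < 31 else 20.0) for k in range(60)]
monthly = resample_monthly(daily)
```

Each of the eight scenarios would be resampled in the same way, giving one value per scenario and month.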

The Markov Chain

The HPS RL model requires a significant amount of training data. We generate the training data by sampling from a Markov chain that is generated from the resampled inflow and energy price data. The Markov chain is defined as

\[\begin{equation} P(p_t = \chi^j_t \mid p_{t-1}=\chi^i_{t-1})=\rho_{ij}(t), \quad \forall i, j \in M(t), \tag{1} \end{equation}\]

where the transition probability, \(\rho_{ij}(t)\), is the probability of transitioning from node \(i\) in time stage \(t-1\), with uncertain data \(\chi^i_{t-1}\), to node \(j\) in time stage \(t\). \(p_t\) is the realized data in stage \(t\), and \(M(t)\) is the set of nodes in time stage \(t\).
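Sampling one training scenario from such a chain can be sketched as follows. This is a minimal illustration of Equation (1), not the HydroRL implementation: the function `sample_path`, its argument layout, and the initial node weights are assumptions introduced here.

```python
import random


def sample_path(node_values, transitions, initial_weights, rng):
    """Sample one scenario from a time-dependent Markov chain.

    node_values[t][j]    : value chi^j_t of node j in stage t
    transitions[t][i][j] : rho_ij(t), probability of moving from node i in
                           stage t-1 to node j in stage t (index 0 is unused)
    initial_weights[j]   : probability of starting in node j in stage 0
    """
    nodes = list(range(len(node_values[0])))
    node = rng.choices(nodes, weights=initial_weights)[0]
    path = [node_values[0][node]]
    for t in range(1, len(node_values)):
        nodes = list(range(len(node_values[t])))
        # The next node depends only on the current node (Markov property).
        node = rng.choices(nodes, weights=transitions[t][node])[0]
        path.append(node_values[t][node])
    return path


# Hypothetical chain with two nodes per stage and three stages.
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
rho = [None,
       [[0.7, 0.3], [0.2, 0.8]],
       [[0.5, 0.5], [0.1, 0.9]]]
scenario = sample_path(values, rho, [0.5, 0.5], random.Random(42))
```

Calling `sample_path` repeatedly yields as many training scenarios as needed, each consistent with the transition probabilities \(\rho_{ij}(t)\).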

The examples below illustrate a Markov chain with four nodes per time stage for the inflow and energy price data above. Note that the size of each marker indicates the unconditional weight of the node in the given time stage.

The shaded area labeled 1 std. dev shows one standard deviation between the nodes in the Markov chain and the underlying data, which gives an indication of how well the Markov chain fits the underlying data. There are three different configurations to choose from when sampling from the Markov chain. They are selected by setting the noise value to “Off”, “White” or “StandardDev” in the RunSettings object of the Web API. Their behavior is defined as follows:

  • “Off”

    • The sampled values from the Markov chain are given by the values of the nodes.

  • “White”

    • White noise (\(N(0,1)\)) is added to the sampled values from the Markov chain.

  • “StandardDev”

    • Noise defined by \(N(0,\sigma^2)\) is added to the sampled values from the Markov chain.
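The three settings can be sketched as a single function. This is only an illustration of the behavior listed above: the function `add_noise` and its arguments are hypothetical, and `sigma` is assumed to be the standard deviation between the node and the underlying data.

```python
import random


def add_noise(node_value, mode, sigma, rng):
    """Return a sampled value for one node under the given noise setting.

    sigma is only used when mode is "StandardDev" (assumed to be the
    standard deviation between the node and the underlying data).
    """
    if mode == "Off":
        return node_value                         # node value, unchanged
    if mode == "White":
        return node_value + rng.gauss(0.0, 1.0)   # N(0, 1) noise
    if mode == "StandardDev":
        return node_value + rng.gauss(0.0, sigma)  # N(0, sigma^2) noise
    raise ValueError(f"unknown noise mode: {mode!r}")


rng = random.Random(1)
sample_off = add_noise(5.0, "Off", 2.0, rng)
sample_std = add_noise(5.0, "StandardDev", 2.0, rng)
```

Note that `random.gauss` takes the standard deviation, not the variance, as its second argument.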

Since adding noise to the Markov chain values can produce negative values, the sampled values are clipped to bounds. For the inflow, the bounds are chosen such that the inflow is non-negative and the expected inflow is not shifted. The energy price is clipped such that only non-negative values are observed.
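One way to realize such bounds is sketched below. The exact bounds used by the model are not stated here, so the symmetric interval for the inflow is an assumption: with zero-mean symmetric noise, clipping to an interval centered on the node value leaves the expected value at the node value. The function names are hypothetical.

```python
def clip_inflow(sample, node_value):
    """Clip to [0, 2 * node_value], an interval symmetric around the node.

    Assumed bounds: symmetric clipping keeps the mean of symmetric,
    zero-mean noise unshifted while ruling out negative inflow.
    """
    return min(max(sample, 0.0), 2.0 * node_value)


def clip_price(sample):
    """Clip the energy price from below so it is never negative."""
    return max(sample, 0.0)


# Hypothetical node value of 3.0 with three noisy inflow samples.
clipped = [clip_inflow(s, 3.0) for s in (-1.0, 4.0, 7.0)]
```

In this sketch a negative inflow sample is raised to zero, and a sample far above the node value is lowered by the same margin on the other side.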

Below is an illustration with 10 sampled values with the noise parameter set to “StandardDev”.